ABSTRACT

Gene duplications are a major source of evolution- 
ary innovations. Understanding the functional diver- 
gence of duplicates and their role in genetic 
robustness is an important challenge in biology. 
Previously, analyses of genetic robustness were 
primarily focused on duplicate essentiality and 
epistasis in several laboratory conditions. In this 
study, we use several quantitative data sets to 
understand compensatory interactions between 
Saccharomyces cerevisiae duplicates that are 
likely to be relevant in natural biological popula- 
tions. We find that, owing to their high functional 
load, close duplicates are unlikely to provide sub- 
stantial backup in the context of large natural popu- 
lations. Interestingly, as duplicates diverge from 
each other, their overall functional load is reduced. 
At intermediate divergence distances the quantita- 
tive decrease in fitness due to removal of one dupli- 
cate becomes smaller. At these distances, yeast 
duplicates display more balanced functional loads 
and their transcriptional control becomes signifi- 
cantly more complex. As yeast duplicates diverge 
beyond 70% sequence identity, their ability to 
compensate for each other becomes similar to 
that of random pairs of singletons. 

INTRODUCTION 

Survival of biological systems crucially depends on robust- 
ness to harmful genetic mutations, i.e. genetic robustness, 
and to changes in environmental conditions (1-3). Two 
distinct mechanisms of genetic robustness have been 
previously discussed. First, alternative signaling and meta- 
bolic pathways provide an important mechanism for 
rerouting in many molecular networks (4,5). Second, a 
major role in genetic robustness is attributed to gene 
duplicates (1,6). Gene duplications are frequent in evolution 
and range in size from small-scale (SSD) to whole-genome 
events (WGD) (7,8). While in ~90% of the cases 
one duplicate is eventually lost in evolution (6), duplicated 
genes that remain in the genome can, at least partially, 
back up each other's functions. Importantly, functional 
compensation by duplicates plays a significant role in buf- 
fering deleterious human mutations (9). 

Genetic robustness due to gene duplicates is inherently 
tied to their functional divergence. Duplicates that acquire 
distinct molecular functions (MFs) are naturally unable to 
compensate for one another. In addition, even if MF is 
conserved, incomplete compensation between duplicates is 
possible owing to different expression patterns or dosage 
effects. Gene duplications are the major source of new 
genes (10) and several conceptual models of duplicates' 
evolution have been proposed (11,12). In the neofunctio- 
nalization model one duplicate gains new functions, i.e. 
functions not associated with the ancestral gene, while 
the other duplicate retains the ancestral functions 
(10,13,14). In contrast, in the subfunctionalization model 
both duplicates become indispensable and are retained in 
evolution by partitioning the ancestral gene functions 
(15,16). Both these models imply an eventual loss of the 
ability of duplicates to fully substitute for each other. It is 
also likely that a significant fraction of duplicates are fixed 
and retained in genomes owing to selective advantages, 
such as dosage effects or condition-specific expression 
patterns, present from the moment of duplication 
(17,18). In cases of fixation due to a selective advantage, 
full compensation between duplicates is unlikely. 

Even though full compensation between duplicates is 
not expected in the long term, the ability of duplicates 
to buffer deleterious mutations of their paralogs has 
been now demonstrated by several independent observa- 
tions. These include a lower than expected fraction of 
essential genes with close duplicates (1), a paucity of 
pairwise epistatic interactions involving duplicated genes 
(19), and an excess of aggravating genetic interactions 
between paralogs (20,21). The contribution of duplicates 
to robustness has been primarily considered in the context 
of qualitative or quantitative growth phenotypes either in 
nutrient-rich or in a small number of laboratory conditions 
(1,22,23). Although popular in experiments, these 
conditions are unlikely to approximate well the natural 
'milieu' of living systems, which are constantly bombarded 
by a diverse array of environmental stresses and stimuli. 
Perhaps more importantly, even if there is a strong compensatory 
interaction between a pair of duplicates, an evolutionarily 
relevant decrease in fitness can still persist, owing 
to incomplete buffering, after a damaging mutation in 
one of the duplicates (24). In the context of long-term 
evolution, there may not be much difference between mu- 
tations leading to the lethal phenotype and mutations 
associated with a fitness decrease substantially larger 
than the inverse of the effective population size (25,26). 
Given that typical population sizes of free-living microbial 
species are large (>10*-10**) (27), even a small fitness 
decrease can be effectively lethal for these organisms. 
Consequently, quantitative analyses of growth pheno- 
types, preferably in multiple environmental conditions, 
are necessary to understand the extent to which compen- 
sation between duplicates plays an important role in 
natural biological populations. Here we perform such an 
analysis and show that in the context of natural popula- 
tions, genetic buffering mediated by duplicates is likely to 
be rare and, surprisingly, it is not a monotonic function of 
duplicates' divergence. 
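
The population-size argument above can be illustrated with a minimal sketch. It assumes the common rule of thumb that selection outweighs drift once the fitness decrease clearly exceeds the inverse of the effective population size. The function name, the `margin` factor that stands in for "substantially larger", and the example population size are illustrative assumptions, not values taken from this study.

```python
def is_visible_to_selection(fitness_decrease: float,
                            effective_pop_size: float,
                            margin: float = 10.0) -> bool:
    """Return True when a fitness decrease clearly exceeds 1/Ne.

    `margin` is a hypothetical factor encoding "substantially larger";
    it is not a value reported in the study.
    """
    if effective_pop_size <= 0:
        raise ValueError("effective population size must be positive")
    return fitness_decrease > margin / effective_pop_size


# Hypothetical example: a 0.1% growth defect in a population with Ne = 1e7.
example = is_visible_to_selection(0.001, 1e7)
```

Under these assumptions, a defect far too small to register as lethal in a laboratory screen can still be efficiently removed by purifying selection in a large population.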



MATERIALS AND METHODS 

Gene and protein sequences for Saccharomyces cerevisiae, 
Saccharomyces paradoxus, Saccharomyces bayanus, 
Saccharomyces castelli, Saccharomyces mikatae, 
Saccharomyces kudriavzevii and Saccharomyces kluyveri 
were obtained from the Saccharomyces Genome 
Database (SGD; http://downloads.yeastgenome.org/) 
and the study by Kellis et al. (28). Pairs of gene duplicates 
were identified by sequence homology between proteins 
within each genome using BLASTP (29). Only duplicates 
that were bidirectional best hits and could be aligned by 
>80% of each open reading frame's sequence length were 
considered in our analysis (30). Following previous studies 
(1), we excluded ribosomal genes from the analysis owing 
to their high expression, dominant impact on growth and 
strong codon adaptation bias. Evolutionary distances 
between duplicated genes were estimated using the 
method of Yang and Nielsen (31) implemented in the 
PAML package (32); the use of other methods, such as 
maximum likelihood, to estimate Ka and Ks did not sig- 
nificantly change the observed patterns (Supplementary 
Figure S1A). 
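
The duplicate-identification step described above (bidirectional best hits with more than 80% alignment coverage of both proteins) can be sketched as follows. This is an illustration only: the `Hit` record layout, the function names and the use of bit score to rank hits are assumptions, and parsing of the BLASTP output is not shown.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """One within-genome BLASTP hit (hypothetical record layout)."""
    query: str
    subject: str
    bitscore: float
    query_aligned: int    # aligned residues in the query protein
    subject_aligned: int  # aligned residues in the subject protein


def best_hits(hits):
    """Map each query to its highest-scoring non-self hit."""
    best = {}
    for h in hits:
        if h.query == h.subject:
            continue  # ignore self-hits
        if h.query not in best or h.bitscore > best[h.query].bitscore:
            best[h.query] = h
    return best


def duplicate_pairs(hits, lengths, min_coverage=0.8):
    """Return bidirectional best-hit pairs aligned over more than
    `min_coverage` of the length of both proteins."""
    best = best_hits(hits)
    pairs = set()
    for q, h in best.items():
        back = best.get(h.subject)
        if back is None or back.subject != q:
            continue  # not a bidirectional best hit
        cov_q = h.query_aligned / lengths[q]
        cov_s = h.subject_aligned / lengths[h.subject]
        if cov_q > min_coverage and cov_s > min_coverage:
            pairs.add(tuple(sorted((q, h.subject))))
    return pairs
```

Each pair is stored as a sorted tuple so that the two directions of the same pair collapse into one entry.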

We used the data obtained by Hillenmeyer et al. (33) to 
measure the fitness contribution of duplicates across 
multiple environmental conditions and chemical perturbations. 
Using a P-value cutoff of 0.01, we obtained the 
number of experimental conditions for which a growth 
defect was observed for every single gene deletion 
mutant. We also analyzed quantitative growth measurements 
for double and single deletion yeast strains obtained 
from DeLuna et al. (34) and Costanzo et al. (35). Gene 
essentiality data were obtained from the study of Giaever 
et al. (36). 
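
Counting sensitive conditions for a deletion strain, as described above, amounts to thresholding per-condition P-values. The sketch below assumes a hypothetical mapping from condition name to P-value (not the published file format) and treats a condition as sensitive when its P-value is strictly below the cutoff.

```python
def sensitive_conditions(p_values, cutoff=0.01):
    """Return the conditions whose growth-defect P-value is below the cutoff.

    `p_values` maps condition name -> P-value for one deletion strain
    (a hypothetical layout chosen for illustration).
    """
    return sorted(c for c, p in p_values.items() if p < cutoff)


def sensitive_fraction(p_values, cutoff=0.01):
    """Fraction of tested conditions with a significant growth defect."""
    if not p_values:
        return 0.0
    return len(sensitive_conditions(p_values, cutoff)) / len(p_values)
```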

To functionally characterize duplicated genes, Gene 
Ontology (GO) (37) annotations were collected from 
SGD and Enzyme Commission (EC) annotations from 
the Comprehensive Yeast Genome Database (CYGD) 
(38). Transcription factor binding motifs used in our 
work were compiled from Kafri et al. (39) and the high-confidence 
predictions in Kellis et al. (28). We used 
protein localization data from Huh et al. (40), Codon 
Adaptation Index (CAI) calculations based on the data 
set by Lu et al. (41) and the annotation of protein 
complexes in CYGD. 



RESULTS 

Hillenmeyer et al. (33) quantified growth phenotypes of 
single-gene yeast deletion strains in a large collection of 
environmental conditions. The assembled data set 
contains ~5.5 million phenotypes of heterozygous and 
homozygous mutants in ~400 conditions. The sampled 
conditions represent 27 different environmental stresses 
and hundreds of perturbations with diverse chemical com- 
pounds. Environmental stresses comprised different 
growth media, media lacking specific vitamins or amino 
acids, as well as different pH and temperature regimes. 
This comprehensive collection of phenotypes allowed us 
to investigate in detail the diversification of duplicates' 
functions and their contribution to genetic robustness in 
multiple conditions. 

We first investigated how the average number of sensi- 
tive conditions, i.e. conditions with a significant growth 
decrease due to deletion of one duplicate, depends on 
sequence divergence (Ka) between the duplicated genes 
(Figure 1A and B). We considered the fraction of different 
conditions with a growth phenotype as a quantitative 
measure of compensation capacity for duplicates at 
various divergence distances. For close duplicates the 
average number of sensitive conditions is not significantly 
different from that of a random pair of yeast singletons 
(Figure 1B, horizontal line). Importantly, this result does 
not imply that random gene pairs and close duplicates are 
equivalent in terms of the similarity of their MF. As we 
demonstrate below, the observed pattern is likely due to a 
higher overall functional load of close duplicates. Here 
and throughout the article we use the term 'functional 
load' of a gene to characterize the average fitness de- 
crease — across considered conditions — due to the gene 
deletion; we note that, based on the definition above, the 
functional load is not a measure of the total number of 
MFs a gene has, but it reflects the gene's overall fitness 
contribution. 
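
The definition of functional load given above can be written as a short sketch. The input is assumed to be a list of per-condition fitness decreases for one deletion strain; the function name and input layout are illustrative.

```python
from statistics import mean


def functional_load(fitness_decreases):
    """Average fitness decrease of one deletion strain across conditions."""
    values = list(fitness_decreases)
    if not values:
        raise ValueError("at least one condition is required")
    return mean(values)
```

A gene with a single molecular function can therefore carry a high functional load if its deletion is costly in many conditions.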

Interestingly, the number of sensitive conditions initially 
drops as duplicates diverge, decreasing about 30% at the 
distances corresponding to Ka ≈ 0.1 (Ks ≈ 1, see 
Supplementary Figure S2A and B). As duplicates 
diverge further, the average number of sensitive conditions 
increases again, reaching the average for a random pair 
of yeast singletons at Ka ≈ 0.25. The trend shown in 
Figure 1. Compensation patterns between yeast duplicates as a function of their evolutionary divergence, Ka, the number of nonsynonymous 
substitutions per site. (A) Scatterplot of the fraction of sensitive conditions, i.e. conditions with detectable growth phenotypes resulting from 
duplicate gene deletion, versus Ka. Each dot in the figure represents a pair of yeast duplicates. (B) The average fraction of sensitive conditions 
per duplicate pair. The P-value was calculated using the Mann-Whitney U test. The horizontal lines in A and B indicate the average fraction of 
sensitive conditions for a random pair of yeast singletons. (C) The average fraction of essential duplicates, i.e. duplicates with a lethal phenotype on 
deletion, as a function of Ka. The horizontal line indicates the fraction of essential yeast singletons. Gene essentiality data were obtained from the 
Saccharomyces Genome Deletion Project (36). (D) Fraction of conditions with a significant growth decrease for deletion of yeast duplicates arising 
from small-scale (SSD) and whole-genome duplications (WGD). The duplicates were classified as SSD or WGD based on the study by Kellis et al. 
(8). The horizontal line shows the average fraction of sensitive conditions for a random pair of yeast singletons. In the figures, error bars represent the 
standard error of the mean (SEM). 



Figure 1B is not sensitive to the P-value cutoff used to 
determine the significance of the growth decrease observed 
in mutant strains (Supplementary Figure S3). A similar 
trend was also observed for the average growth decrease 
(functional load), measured either by log ratios or 
Z-scores across all tested conditions (Supplementary 
Figure S4A and B). Bin-free analyses of the data 
(Supplementary Figures S1B and C and S2B) also 
revealed a smaller fitness cost due to the loss of duplicates 
at intermediate distances (Ka ≈ 0.1). 
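
The binned averages with SEM error bars referred to above can be approximated by the following sketch. The bin edges, the input layout (a list of `(Ka, value)` tuples, one per duplicate pair) and the function name are assumptions for illustration; the bins used in the study are not specified here.

```python
from math import sqrt
from statistics import mean, stdev


def binned_mean_sem(pairs, edges):
    """Summarize values in Ka bins.

    `pairs` is a list of (ka, value) tuples and `edges` an ascending list of
    bin edges; each bin covers low <= ka < high. Returns one
    (low, high, mean, sem, n) tuple per non-empty bin.
    """
    summary = []
    for low, high in zip(edges, edges[1:]):
        values = [v for ka, v in pairs if low <= ka < high]
        if not values:
            continue
        # SEM is undefined for a single observation; report 0.0 in that case.
        sem = stdev(values) / sqrt(len(values)) if len(values) > 1 else 0.0
        summary.append((low, high, mean(values), sem, len(values)))
    return summary
```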

Because most actively growing wild-type yeast popula- 
tions are diploid (42), we mainly focused our analysis on 
heterozygous mutant strains. The patterns of functional 
compensation for heterozygous and homozygous 
mutants are similar when multiple-drug resistance genes, 
as defined by Hillenmeyer et al. (33), are not considered 
(Supplementary Figure S4C). The trends also remain 
similar when only environmental perturbations are 
analyzed in the homozygous experiments (Supplementary 
Figure S4D). We also checked that the observed compen- 
sation patterns due to closest duplicates are not signifi- 
cantly influenced by additional, i.e. more diverged, 
paralogs (Supplementary Figure S4E). This lack of signifi- 
cant compensation by diverged duplicates results in an 
approximately linear relationship between the number of 
sensitive conditions per yeast protein family and the 
family size (Supplementary Figure S5). Finally, the 
observed compensation patterns were not affected by 
removal of gene pairs with a high CAI (Supplementary 
Figure S6A), suggesting that the observed trend cannot 
be explained by expression-based constraints on the rate 
of duplicate sequence evolution (Ka) (43) or high expres- 
sion levels of certain duplicates. 

It is interesting to compare the ability of duplicates to 
buffer mutations leading to any detectable growth 
decrease beyond a given fitness threshold (Figure 1B) 
and their role in protecting against the no-growth phenotype, 
i.e. the likelihood of observing essential genes in duplicate 
pairs. In Figure 1C, using data from the study by 
Giaever et al. (36), we show the fraction of essential du- 
plicates as a function of their divergence. In agreement 
with previous studies (1,22,23) we found that the 
fraction of essential genes remains low and approximately 
constant for close duplicates, and increases substantially 
only at divergence distances corresponding to Ka > 0.4. 
Notably, this pattern is qualitatively different from the 
compensation for quantitative growth phenotypes 
(Figure 1B), demonstrating the aforementioned impact 
of using quantitative phenotypes to assess the evolution- 
arily relevant consequences of mutations. Also in contrast 
to patterns obtained in studies based on essential genes 
(22), we observed similar compensation profiles for gene 
pairs originating from small-scale and genome-wide duplications 
(Figure 1D). Because all WGD duplicates have the 
same age, this result suggests that the ability of duplicates 
to buffer each other's function across multiple conditions 
depends more strongly on their sequence divergence than 
on the time since duplication. 

It is likely that the observed decrease in the number of 
sensitive conditions at intermediate divergence distances 
(Ka ≈ 0.1) is due to a decrease of the functional load 
carried at these distances by the union of duplicate 
genes. To explore this possibility, we considered the quantitative 
fitness data from DeLuna et al. (34) and the synthetic 
genetic array (SGA) data from Costanzo et al. (35). 
In these studies, the authors performed quantitative 
growth measurements of yeast strains with individual 
and simultaneous deletions of duplicates. Using the 
single deletion phenotypes from the DeLuna et al. 
(Figure 2A) and Costanzo et al. studies (Figure 2B), we 
observed fitness profiles similar to the one obtained based 
on the data from Hillenmeyer et al. (Figure 1B) as a 
function of Ka, with smaller phenotypic effects at inter- 
mediate distances. Interestingly, the overall functional 
load of duplicate pairs, measured by the phenotype of 
double deletions, indeed substantially decreases with 
their divergence (Figure 2C and D). This result suggests 
that while close duplicates are more likely to have similar 
functions, their higher functional load makes complete 
compensation less likely. Because the overall functional 
load of duplicates remains approximately constant for 
Ka > 0.15, the higher fraction of detectable growth phenotypes 
at these distances is likely due to a decreased ability 
for functional compensation as duplicates diverge. 
Compensation between duplicates quantified by the 
presence of aggravating interactions between duplicate 
pairs decreases as a function of sequence divergence 
(Figure 2E and F) [see (21)]. 

Besides a smaller overall functional load, it is possible 
that duplicates at intermediate distances have other 
properties that favor genetic robustness. To explore this 
possibility, for each duplicate pair, we looked at the gene 
with the largest and the gene with the smallest number of 
sensitive conditions (Figure 3A). Notably, while the duplicate 
with more conditions (Figure 3A, more sensitive 
duplicate) follows the average trend for all duplicates 
(Figure 1B), the duplicate with fewer conditions (Figure 
3A, less sensitive duplicate) shows a steady gain in the 
number of conditions as a function of Ka. Consequently, 
the functional load of close duplicates, measured by the 
number of sensitive conditions, is different, and this differ- 
ence becomes significantly smaller as the genes diverge 
(Figure 3B. Pearson's r = —0.64, P = 7 x 10""^, see also 
Supplementary Figure S2C). Close duplicates with the 
larger number of sensitive conditions also show a higher 
evolutionary constraint, evaluated by the normalized ratio 
of nonsynonymous to synonymous substitutions per nucleo- 
tide site, Ka/Ks (Wilcoxon Signed Rank test P = 7 x 10~^, 
Figure 3C). This result agrees with previous reports of 
asymmetric evolution of duplicates in the context of co-ex- 
pression, genetic interaction and protein-protein interaction 
networks (19,44,45). The observed asymmetry in the functional 
load between close duplicates can make buffering 
difficult, for example, if the less sensitive duplicate is 
expressed only under specific environmental conditions. 
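
The asymmetry measure used in this analysis, the absolute difference in the number of sensitive conditions normalized to the pair's total, can be sketched as below. The function name is illustrative, and returning 0.0 for pairs with no sensitive conditions is an assumption made here to avoid division by zero.

```python
def relative_load_difference(n_first, n_second):
    """Relative difference in the number of sensitive conditions of a pair.

    Computed as |n_first - n_second| / (n_first + n_second).
    """
    total = n_first + n_second
    if total == 0:
        # Assumption: a pair with no sensitive conditions counts as balanced.
        return 0.0
    return abs(n_first - n_second) / total
```

The value ranges from 0 for perfectly balanced duplicates to 1 when only one duplicate shows any growth phenotype.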

To further explore the mechanism behind the observed 
backup patterns, we analyzed the functional diversification 
of yeast duplicates as a function of their sequence diver- 
gence (Ka). First, for genes encoding metabolic enzymes 
we calculated the fraction of gene pairs with conserved EC 
numbers (Figure 4A); the conservation of EC numbers 
indicates that corresponding proteins catalyze identical 
biochemical reactions. Second, we calculated the fraction 
of shared GO terms describing protein MF for all duplicates 
(Figure 4B). Both measures showed that the MF of 
yeast duplicates typically starts to substantially diverge 
only at about Ka > 0.4. The timing of this divergence ap- 
proximately coincides with a significant increase in the 
fraction of essential duplicates (Figure 1C). On the other 
hand, the significant changes in the number of quantitative 
growth phenotypes are observed when the MF of dupli- 
cates is usually still conserved. 
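
One way to compute a fraction of shared annotation terms for a duplicate pair is sketched below. The text does not state the exact normalization, so the intersection-over-union (Jaccard) form used here is an assumption, as are the function name and the placeholder term identifiers in the comments.

```python
def shared_term_fraction(terms_a, terms_b):
    """Fraction of annotation terms shared by two genes.

    Assumption: shared terms are normalized by the union of both term sets.
    Returns None when neither gene has any annotation.
    """
    a, b = set(terms_a), set(terms_b)
    union = a | b
    if not union:
        return None
    return len(a & b) / len(union)


# Hypothetical placeholder identifiers, not real GO or EC terms.
example = shared_term_fraction({"term1", "term2"}, {"term2", "term3"})
```

The same function applies to EC numbers, where a value of 1.0 would indicate that both enzymes are annotated with identical reactions.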

A complementary analysis of transcription factor 
binding sites suggests that gene regulation plays an import- 
ant role in establishing the observed compensation 
patterns. It was previously demonstrated that duplicated 
yeast genes have, on average, a higher number of cis-regu- 
latory motifs than singleton genes (46). Using a compre- 
hensive data set of ~150 known and predicted DNA 
binding motifs in yeast (28,39), we found that the average 
number of different motifs regulating a duplicate pair increases 
significantly at Ka ≈ 0.1 (Figure 4D, dashed line, 
Mann-Whitney U test, P = 0.06). At this divergence 
distance, the average number of different motifs per dupli- 
cate pair is more than twice the number of motifs for a pair 
of yeast singletons (Figure 4D, dashed horizontal line). 
The number of regulatory motifs increases both for the 
Figure 2. Growth phenotypes for individual and simultaneous deletion of duplicates as a function of their sequence divergence (Ka). The results in 
the first column (A, C, E) are based on the competition experiments by DeLuna et al. (34), and in the second column (B, D, F) on the synthetic 
genetic arrays (SGA) by Costanzo et al. (35). (A, B) Fractions of single duplicate deletions with a significant growth decrease. (C, D) Fractions of 
simultaneous (double) duplicate deletions with a significant growth decrease. Due to different measurement sensitivities of the two studies, different 
cutoffs were used to determine a significant growth decrease: 1% for DeLuna et al. (A, C) and 10% for Costanzo et al. (B, D); the presented results 
are not sensitive to the exact cutoff values (see Supplementary Figure S7). P-values were obtained using Fisher's exact test. (E, F) Fraction of 
paralogs with a significant negative epistatic interaction from the studies of DeLuna et al. and Costanzo et al., respectively. In the figures, error bars 
represent the SEM. 



duplicate with the highest and the duplicate with the 
smallest number of sensitive conditions (Supplementary 
Figure S8A and B). The increase in complexity of the duplicates' 
regulation at Ka ≈ 0.1 is also confirmed by a significant 
increase (Mann-Whitney U test, P = \ x 10~^) at 
these distances of the number of transcription factor 
mutants (47) affecting duplicate gene expression (Figure 
4D, solid line). 

While the total number of DNA motifs regulating duplicates 
initially increases with divergence, the fraction of 
shared motifs [Supplementary Figure S9A, see also (48)], 
the overlap in GO terms describing biological processes 
(Figure 4C) and the overlap in cellular localization 
observed in fluorescence-tagging experiments (40) 
decrease (Supplementary Figure S8B). Such a pattern 
suggests that the increase in regulatory complexity 
allows duplicates to specialize for different biological 
processes while mostly preserving common MFs. The 
ability of duplicates with partially diverged regulatory 
regions to compensate for each other through expression 
Figure 3. Differences in the number of sensitive conditions between duplicates. (A) The average fraction of sensitive conditions for the duplicates 
with the higher and lower number of sensitive conditions in each pair; Ka values represent sequence divergence between duplicates. The P-value is for 
the Mann-Whitney U test. (B) The relative difference in the number of sensitive conditions between duplicates as a function of their initial 
divergence; Ka values represent sequence divergence between duplicates. The relative difference was calculated as the absolute difference in the 
number of sensitive conditions between duplicates normalized to the total number of sensitive conditions for the pair (Spearman's /• = —0.60, 
i" = 2 X 10^'; Pearson's r = —0.64, P ^ 7 x 10^"*). (C) The average Ka/Ks ratio for the paralogs with the largest (more sensitive) and smallest 
(less sensitive) number of conditions with a significant growth decrease. Ka/Ks ratios were calculated relative to orthologous sequences in S. bayanus. 
Only duplicates with Ka<0.15 to each other were considered. The P-value is for the Wilcoxon signed rank test. 



changes of the intact gene was previously described by 
Kafri et al. (39,49). Also, the recent study by DeLuna 
et al. (50) showed that on deletion of one duplicate, ex- 
pression changes of the remaining paralog are often need- 
based, i.e. they happen primarily when the corresponding 
function is required. Such regulatory backup circuits 
should, at least in some cases, enable functional compensation 
between homologs with different expression 
patterns in wild type. Notably, based on the data from 
a recent study by Springer et al. (51), who measured the 
expression changes of yeast genes when one of two 
genomic copies was deleted in diploid cells, we observed 
a significant dosage response only for genes forming 
recently duplicated pairs (Ka<0.15, Figure 4E). This 
suggests that genes with close duplicates are most respon- 
sive to dosage effects. 



Finally, the patterns of diversification and functional 
compensation described above should correlate with the 
process of duplicate loss in evolution. We investigated the 
retention of yeast duplicates using the complete genomic 
sequences of seven species: S. cerevisiae, S. paradoxus, 
S. bayanus, S. castelli, S. mikatae, S. kudriavzevii and 
S. kluyveri. We calculated the number of remaining duplicates 
as a function of their sequence divergence (Figure 5, 
see also Supplementary Figure S10A for the corresponding 
relationships in individual yeast species). This analysis 
suggests that a relatively brief initial period of high dupli- 
cate loss (6) is followed by a long evolutionary period 
(Ka>0.1) during which the average loss rate decreases 
> 10-fold (red in Figure 5). Interestingly, the loss rate sig- 
nificantly decreases approximately at the divergence 
distance when duplicates become more similar in terms 
Figure 4. Diversification of duplicates function and regulation. (A) Fraction of metabolic duplicates sharing the same EC numbers; conservation of 
EC numbers indicates catalysis of identical biochemical reactions. (B) Fraction of GO MF terms shared between duplicates. (C) Fraction of GO 
Biological Process (BP) terms shared between duplicates. In panels B and C we considered only GO terms with a distance of three or more to the 
of their functional load (Figure 3B) and when their regulatory 
complexity significantly increases (Figure 4D). It is 
likely that the duplicates surviving the initial loss stage 
develop independent functionalities and are preserved 
for long times in the genomes of yeast species. 



DISCUSSION 

In the present study, we analyzed genetic robustness due 
to duplicates in the context of quantitative growth pheno- 
types and sensitivities to gene deletions in multiple envir- 
onmental conditions. Such robustness is important for 
understanding the buffering of deleterious mutations in 
large natural biological populations. Our results demonstrate 
that, contrary to a commonly held view, close gene 
duplicates are unlikely to provide a high level of backup in 
the context of large natural populations. Consequently, it 
is unlikely that many duplicates are fixed in natural popu- 
lations specifically due to selection for robustness. 

Our analysis also suggests that duplicate redundancies 
described in genomics databases, and frequently observed 
in laboratory experiments, should be considered with 
caution, at least with respect to their functions in 
natural biological populations. To investigate this point 
further, we analyzed a set, compiled by Kafri et al. (52), 
of 112 yeast duplicates reported to be at least partially 
redundant in research publications. These duplicates 
have been described as redundant based on their func- 
tional overlap and compensatory interactions observed 
in small-scale experimental studies. Interestingly, based 
on the number of conditions with quantitative growth 
phenotypes from the study by Hillenmeyer et al. (33), 
and the quantitative growth measurements by Costanzo 
et al. (35), the duplicates annotated as redundant are not 
significantly different from all other yeast duplicates 
(Mann-Whitney U test, P = 0.13 and 0.35, respectively, 
Supplementary Figure S11). This demonstrates that, 
although many yeast duplicates indeed may show func- 
tional overlap in some laboratory conditions, their com- 
pensation properties will probably be significantly less 
important in large natural populations due to the ability 
of purifying selection to efficiently prune mutations 
causing even a small fitness decrease. 

It is likely that several different factors contribute to the 
relative paucity of functional compensation between 
paralogs at small divergence distances. A significant 
fraction of duplications are likely to be fixed owing to 
dosage effects (17), and functional compensation 
between such duplicates in the context of natural popula- 
tions is unlikely. For example, the lack of significant com- 
pensation between histone pairs, HTA1-HTA2 and 
HHT1-HHT2, is likely to be a consequence of their role 
in maintaining proper histone levels in yeast cells. Gene 
dosage may explain the inability of some duplicates to 
back up each other, but it is unlikely to be the only explanation. 
We showed that even when all duplicate pairs with a 
high CAI (Supplementary Figure S6A) or pairs forming 
known protein complexes (Supplementary Figure S6B) are 
removed from the analysis, the patterns of functional com- 
pensation remain similar. Notably, genes with a high CAI 
have been also associated with higher frequencies of 
interlocus gene conversion (IGC) (53,54). While IGC 
can slow down the rate of duplicates sequence divergence 
(55), analyses based only on WGDs with no evidence of 
IGC [using data recently reported by Casola et al. (56)] 
revealed essentially identical compensation patterns 
(Supplementary Figure S12). 

Close duplicates are also less likely to compensate for 
each other probably owing to the aforementioned dichot- 
omy in their functional loads (Figure 3A and B). Many 
close duplicates can be classified, based on their activity 
and breadth of expression, into major and minor functional 
isoforms. For example, the glyceraldehyde-3-phosphate 
dehydrogenase TDH1 is active under various stress 
conditions, while its isoenzyme TDH2 is used primarily 
during exponential growth (57). Similarly, the ubiquitin 
conjugating enzyme UBC4 is expressed during exponen- 
tial growth, while its duplicate UBC5 is active during sta- 
tionary phase (58). The difference in functional load 
for close yeast duplicates is also consistent with the 
asymmetric partition of functions, interactions and gene 
expression, observed between close duplicates in other organisms, 
for example, Arabidopsis and human (45,59,60). 
This suggests that duplicate-dependent compensation in 
the context of natural populations may be limited in 
other species as well. 

Our analysis suggests that a typical lifecycle of gene 
duplicates in yeast consists of several distinct evolutionary 
stages (11,12). In the first stage (at duplicate distances cor- 
responding to Ka < 0.05), duplicates tend to have high 
overall functional loads and significant asymmetry in the 
number of sensitive conditions; both of these factors make 
complete compensation unlikely. The high functional load 
of close duplicates suggests that adaptive selection plays 
an important role in their fixation. In the second stage 
(0.05 < Ka < 0.25), as duplicates diverge further, their 
overall functional load usually decreases. This may 
happen, for example, due to relaxation of the environmen- 
tal conditions, which facilitated the original duplicate 
fixation. The vast majority of duplicates, likely the 
paralogs with relatively smaller functional loads 
(Figure 3C), are lost at this stage (Figure 5). Gene pairs 
that survive the period of high duplicate loss display more 
balanced functional loads and complex regulation; these 
gene pairs are usually retained for long evolutionary times 
in yeast genomes (Figure 5). Surviving duplicates can 
provide at least partial compensation at intermediate di- 
vergence distances and also serve as an important source 
of new protein functions. In the third stage (Ka > 0.3 or 
~70% sequence identity), the lifecycle of duplicates is 
completed when their functional roles diverge, and their 
quantitative compensation properties become indistin- 
guishable from those of random pairs of yeast singletons. 

SUPPLEMENTARY DATA 

Supplementary Data are available at NAR Online. 